78 research outputs found
Predicate Abstraction with Indexed Predicates
Predicate abstraction provides a powerful tool for verifying properties of
infinite-state systems using a combination of a decision procedure for a subset
of first-order logic and symbolic methods originally developed for finite-state
model checking. We consider models containing first-order state variables,
where the system state includes mutable functions and predicates. Such a model
can describe systems containing arbitrarily large memories, buffers, and arrays
of identical processes. We describe a form of predicate abstraction that
constructs a formula over a set of universally quantified variables to describe
invariant properties of the first-order state variables. We provide a formal
justification of the soundness of our approach and describe how it has been
used to verify several hardware and software designs, including a
directory-based cache coherence protocol.Comment: 27 pages, 4 figures, 1 table, short version appeared in International
Conference on Verification, Model Checking and Abstract Interpretation
(VMCAI'04), LNCS 2937, pages = 267--28
Towards Generating Functionally Correct Code Edits from Natural Language Issue Descriptions
Large language models (LLMs), such as OpenAI's Codex, have demonstrated their
potential to generate code from natural language descriptions across a wide
range of programming tasks. Several benchmarks have recently emerged to
evaluate the ability of LLMs to generate functionally correct code from natural
language intent with respect to a set of hidden test cases. This has enabled
the research community to identify significant and reproducible advancements in
LLM capabilities. However, there is currently a lack of benchmark datasets for
assessing the ability of LLMs to generate functionally correct code edits based
on natural language descriptions of intended changes. This paper aims to
address this gap by motivating the problem NL2Fix of translating natural
language descriptions of code changes (namely bug fixes described in Issue
reports in repositories) into correct code fixes. To this end, we introduce
Defects4J-NL2Fix, a dataset of 283 Java programs from the popular Defects4J
dataset augmented with high-level descriptions of bug fixes, and empirically
evaluate the performance of several state-of-the-art LLMs for the this task.
Results show that these LLMS together are capable of generating plausible fixes
for 64.6% of the bugs, and the best LLM-based technique can achieve up to
21.20% top-1 and 35.68% top-5 accuracy on this benchmark
Verification modulo versions: Towards usable verification
Abstract We introduce Verification Modulo Versions (VMV), a new static analysis technique for reducing the number of alarms reported by static verifiers while providing sound semantic guarantees. First, VMV extracts semantic environment conditions from a base program P. Environmental conditions can either be sufficient conditions (implying the safety of P) or necessary conditions (implied by the safety of P). Then, VMV instruments a new version of the program, P , with the inferred conditions. We prove that we can use (i) sufficient conditions to identify abstract regressions of P w.r.t. P; and (ii) necessary conditions to prove the relative correctness of P w.r.t. P. We show that the extraction of environmental conditions can be performed at a hierarchy of abstraction levels (history, state, or call conditions) with each subsequent level requiring a less sophisticated matching of the syntactic changes between P and P. Call conditions are particularly useful because they only require the syntactic matching of entry points and callee names across program versions. We have implemented VMV in a widely used static analysis and verification tool. We report our experience on two large code bases and demonstrate a substantial reduction in alarms while additionally providing relative correctness guarantees
Mutual Summaries: Unifying Program Comparison Techniques
Abstract. In this paper, we formalize mutual summaries as a contract mechanism for comparing two programs, and provide a method for checking such contracts modularly. We show that mutual summary checking generalizes equivalence checking, conditional equivalence checking and translation validation. More interestingly, it enables comparing programs where the changes are interprocedural. We have prototyped the ideas in SymDiff, a Boogie based language-independent infrastructure for comparing programs
Zapato: Automatic theorem proving for
Counterexample-driven abstraction refinement is an automatic process that produces abstract models of finite and infinite-state systems. When this process is applied to software, an automatic theorem prover for quantifier-free first-order logic helps to determine the feasibility of program paths and to refine the abstraction. In this paper we report on a fast, lightweight, and automatic theorem prover called Zapato which we have built specifically to solve the queries produced during the abstraction refinement process
Fault-Aware Neural Code Rankers
Large language models (LLMs) have demonstrated an impressive ability to
generate code for various programming tasks. In many instances, LLMs can
generate a correct program for a task when given numerous trials. Consequently,
a recent trend is to do large scale sampling of programs using a model and then
filtering/ranking the programs based on the program execution on a small number
of known unit tests to select one candidate solution. However, these approaches
assume that the unit tests are given and assume the ability to safely execute
the generated programs (which can do arbitrary dangerous operations such as
file manipulations). Both of the above assumptions are impractical in
real-world software development. In this paper, we propose CodeRanker, a neural
ranker that can predict the correctness of a sampled program without executing
it. Our CodeRanker is fault-aware i.e., it is trained to predict different
kinds of execution information such as predicting the exact compile/runtime
error type (e.g., an IndexError or a TypeError). We show that CodeRanker can
significantly increase the pass@1 accuracy of various code generation models
(including Codex, GPT-Neo, GPT-J) on APPS, HumanEval and MBPP datasets.Comment: In the proceedings of Advances in Neural Information Processing
Systems, 202
- …